Abstract

In the context of attractor neural networks, we study how the equilibrium
analog neural activities, reached by the network dynamics during memory
retrieval, may improve storage performance by reducing the interference
between the recalled pattern and the other stored ones. We determine a simple
dynamics that stabilizes network states which are highly correlated with the
retrieved pattern, for a number of stored memories that does not exceed
$\alpha_* N$, where $\alpha_* \in [0, 0.41]$ depends on the global activity level
in the network and $N$ is the number of neurons.

87.15 Neural networks, 05.20 Statistical mechanics 
Attractor neural networks (ANN's) have been the subject of intensive study among
physicists since the original paper of Hopfield. The analogy between the thermodynamics
of ANN's and of spin glasses has been used to interpret the associative processes taking
place in neural networks in terms of collective nonergodic phenomena. The identification of
attractors with the internal representation of the memorized patterns, though still an object
of open discussion, has received some basic experimental confirmations and represents
one of the basic issues for the biologically motivated models currently under study. Several
authors have studied ANN's composed of analog neurons instead of the discrete spinlike
neurons of the original model and have shown that such more realistic networks may
perform as associative memories. In the present paper we will be concerned with the
following issue: assuming the interaction (synaptic) matrix to be the simple Hebb-Hopfield
correlation matrix, we discuss how the storage performance of an ANN may depend on the
equilibrium analog neural activities reached by the dynamics during memory retrieval.

In both discrete and analog Hopfield-like attractor neural networks, the phase transition
of the system from associative memory to spin glass is due to temporal correlations
arising from the static noise produced by the interference between the retrieved pattern and
the other stored memories. The introduction of a suitable cost function in the space of
neural activities allows us to study how such a static noise may be reduced and to derive
a class of simple response functions for which the dynamics stabilizes the "ground-state"
neural activities, i.e., the ones that minimize the cost function, for a macroscopic number of
patterns.

We first give some basic definitions and then proceed in four steps.

(i) We study the ground states of a cost function $E$ defined in the phase space
$\mathcal{E}$ of the neural activities and proportional to the sum of the squared overlaps
of the network state with the stored memories except the retrieved one.

(ii) We derive the associated gradient flow in $\mathcal{E}$ and show that it converges to
the ground state of $E$.

(iii) We show that when the minimum of the cost function is zero there is a linear relation
between the afferent current and the activity at each site of the network.

(iv) We determine a simple dynamics (which turns out to be characterized by a
nonmonotonic response function) stabilizing the ground state of the system when the number
of stored memories is smaller than $\alpha_* N$, where $\alpha_* \in [0, 0.41]$ depends on
the global activity level.

The neural network model is assumed to be fully connected and composed of $N$ neurons
whose activities $\{V_i\}_{i=1,N}$ belong to the interval $[-1, 1]$. The global activity of
the network is defined by

$$\gamma = \frac{1}{N}\sum_i \epsilon_i , \tag{1}$$

where $\epsilon_i = |V_i|$ and we denote by $\mathcal{E} = [0,1]^N$ the space of all
$\vec\epsilon$. A macroscopic set of $P = \alpha N$ random binary patterns
$\Xi \equiv \{\xi_i^\mu = \pm 1\}_{i=1,N;\,\mu=1,P}$, characterized by the probability
distribution $P(\xi_i^\mu) = \frac{1}{2}\delta(\xi_i^\mu - 1) + \frac{1}{2}\delta(\xi_i^\mu + 1)$,
is stored in the network by means of the Hebb-Hopfield learning rule
$J_{ij} = \frac{1}{N}\sum_{\mu=1}^{P} \xi_i^\mu \xi_j^\mu$ for $i \neq j$ and $J_{ii} = 0$.

At site $i$, the afferent current (or local field) $h_i$ is given by the weighted sum of the
activities of the other neurons, $h_i = \sum_j J_{ij} V_j$. We consider a continuous dynamics
for the depolarization $I_i$ at each site $i$ [$\tau \dot{I}_i(t) = -I_i(t) + h_i(t)$], in
which the activity of neuron $i$ at time $t$ is given by $V_i(t) = f(I_i(t))$, where $f$ is
the neuronal response function. No assumptions are made on $f$, except that it aligns the
neural activity with its afferent current [$x f(x) \geq 0$ for all $x$].

We consider the case in which one of the stored patterns, $\mu = 1$ for example, is presented
to the network via an external current which forces the initial configuration of the network;
thus we take $V_i = \epsilon_i \xi_i^1$. The current arriving at neuron $i$ becomes

$$h_i = \gamma\,\xi_i^1 + \frac{1}{N}\sum_{\mu \neq 1}\sum_{j \neq i} \xi_i^\mu \xi_j^\mu \xi_j^1 \epsilon_j , \tag{2}$$

which, in terms of the overlaps of the network state with the stored patterns,
$m_\mu = \frac{1}{N}\sum_j \xi_j^\mu \xi_j^1 \epsilon_j$, reads

$$h_i = m_1 \xi_i^1 + \Big(\sum_{\mu \neq 1} \xi_i^\mu m_\mu - \alpha\,\epsilon_i \xi_i^1\Big) . \tag{3}$$

Notice that the global activity is equal to the overlap of the network configuration with the
retrieved pattern ($m_1 = \gamma$).
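
As a numerical sanity check of Eq. (3), the following sketch builds the Hebb-Hopfield couplings for a small random network and compares the directly computed local field with the signal-plus-cross-talk decomposition. The network size, the seed, and the helper names (`make_patterns`, `local_fields`, `overlaps`, `decomposed_fields`) are illustrative assumptions, not part of the original work; $\alpha$ is taken to be $P/N$, for which the identity holds exactly at finite $N$.

```python
import random

def make_patterns(P, N, rng):
    """P random binary patterns with entries +1 or -1."""
    return [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(P)]

def local_fields(xi, V):
    """h_i = sum_{j != i} J_ij V_j with J_ij = (1/N) sum_mu xi_i^mu xi_j^mu."""
    P, N = len(xi), len(V)
    h = []
    for i in range(N):
        total = 0.0
        for j in range(N):
            if j != i:
                J_ij = sum(xi[mu][i] * xi[mu][j] for mu in range(P)) / N
                total += J_ij * V[j]
        h.append(total)
    return h

def overlaps(xi, V):
    """m_mu = (1/N) sum_j xi_j^mu V_j, where V_j = eps_j * xi_j^1."""
    N = len(V)
    return [sum(x * v for x, v in zip(pattern, V)) / N for pattern in xi]

def decomposed_fields(xi, eps):
    """Right-hand side of Eq. (3), with alpha = P/N (assumption of this sketch)."""
    P, N = len(xi), len(eps)
    V = [e * s for e, s in zip(eps, xi[0])]
    m = overlaps(xi, V)
    alpha = P / N
    h = []
    for i in range(N):
        signal = m[0] * xi[0][i]
        cross_talk = sum(xi[mu][i] * m[mu] for mu in range(1, P))
        h.append(signal + cross_talk - alpha * eps[i] * xi[0][i])
    return h

rng = random.Random(0)
N, P = 60, 6
xi = make_patterns(P, N, rng)
eps = [rng.random() for _ in range(N)]          # analog activities in [0, 1]
V = [e * s for e, s in zip(eps, xi[0])]         # state aligned with pattern 1
direct = local_fields(xi, V)
decomposed = decomposed_fields(xi, eps)
max_diff = max(abs(a - b) for a, b in zip(direct, decomposed))
gamma = sum(eps) / N
print("largest deviation between the two forms:", max_diff)
print("m_1 - gamma:", overlaps(xi, V)[0] - gamma)
```

Both printed numbers should be at the level of floating-point round-off, which also illustrates the remark that $m_1 = \gamma$.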

The first term in the right-hand side (rhs) of Eq. (3) is the signal part, whereas the
second term, when the $\epsilon_i$ are fixed and for $N$ large, is a Gaussian random variable
with zero mean and variance $\sigma^2 = \sum_{\mu > 1} m_\mu^2$ and represents the static noise
part, or cross-talk, due to the interference between the stored patterns and the recalled one.
In the following we will be interested in minimizing the overlaps of the network state with
the stored memories $\mu \neq 1$ which are not recalled. If one succeeds in finding such states
($m_\mu = 0$ for all $\mu \neq 1$), then the second term in the rhs of Eq. (3) reduces to
$-\alpha \epsilon_i \xi_i^1$, and the interference effect vanishes. We will see that this is
indeed the case in a finite range of the parameter $\alpha$ and that it is also possible to
derive a class of effective response functions realizing such a minimization of the
interference and thus leading to an improvement of the network storage capacity.

(i) In order to study how the interference may be reduced, we define a cost function
$E(\Xi, \vec\epsilon)$ depending on the neural activities and the set of stored patterns,
proportional to $\sigma^2$, i.e., to the sum of the squared overlaps of the network state with
all the stored memories except the retrieved one,

$$E(\Xi, \vec\epsilon) = \frac{1}{N}\sum_{\mu \neq 1}\Big(\sum_i \xi_i^\mu \xi_i^1 \epsilon_i\Big)^2 , \tag{4}$$

and we study its ground states. If no constraints are present on $\vec\epsilon$, the minimum
is always $E = 0$ for $\vec\epsilon = 0$; obviously, in this case no information is obtained
when one presents the pattern, and therefore we impose the constraint $\gamma = K$ on the
average activity, where $K \in [0, 1]$. The geometrical picture of the problem is indeed very
simple. In the space $\mathcal{E}$ we have to find the vectors $\vec\epsilon$ as orthogonal as
possible to the $P - 1$ vectors $\vec\eta^{\,\mu} = \{\eta_i^\mu = \xi_i^\mu \xi_i^1\}$, with
the constraint $\gamma = K$. If there exists (with probability 1) at least one $\vec\epsilon$
orthogonal to the $P - 1$ vectors satisfying the constraint, then we have $E = 0$. The subspace
corresponding to the condition $E = 0$ is connected since it is the intersection of $P - 1$
hyperplanes $m_\mu = 0$ and one hyperplane $\gamma = K$.

In order to determine the typical free energy, we compute $\langle \ln Z \rangle_\Xi$, where
$\langle \ldots \rangle_\Xi$ stands for the average over the quenched random variables $\Xi$
and $Z$ is the partition function at temperature $T = 1/\beta$, given by
$Z = \mathrm{Tr}_{\vec\epsilon}\{\exp[-\beta E(\Xi, \vec\epsilon)]\,\delta(\gamma(\vec\epsilon) - K)\}$,
using the standard replica method. Starting from the typical partition function of $n$ replicas
of the system, $\langle Z^n \rangle_\Xi$, for $n$ integer, we perform an analytic continuation
to non-integer values of $n$, thus obtaining

$$\langle \ln Z \rangle_\Xi = \lim_{n \to 0} \frac{\langle Z^n \rangle_\Xi - 1}{n} .$$
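
The limit underlying the replica method can be illustrated for a single positive number standing in for $Z$. The short sketch below is only a scalar illustration (it involves no disorder average), and the value of `Z` and the function name `replica_estimate` are arbitrary choices made here.

```python
import math

def replica_estimate(Z, n):
    """Finite-n approximation (Z**n - 1) / n of ln Z."""
    return (Z ** n - 1.0) / n

Z = 3.7  # arbitrary positive stand-in for a partition function
exact = math.log(Z)
estimates = [(n, replica_estimate(Z, n)) for n in (1.0, 0.1, 0.01, 0.001)]
for n, value in estimates:
    print(f"n = {n:<6} (Z^n - 1)/n = {value:.6f}   ln Z = {exact:.6f}")
```

The estimates approach $\ln Z$ as $n$ decreases, with an error that shrinks roughly linearly in $n$.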



Each replica $a$ ($a = 1, \ldots, n$) is characterized by its neural activities $\epsilon_i^a$
and by the order parameter $Q^a = \frac{1}{N}\sum_i (\epsilon_i^a)^2$, whereas the overlap
between neural activities in two different replicas defines the other order parameters (for
$a < b$) $q^{ab} = \frac{1}{N}\sum_i \epsilon_i^a \epsilon_i^b$. We indicate with $R^a$ and
$r^{ab}$ the conjugate parameters of $Q^a$ and $q^{ab}$, respectively. The typical free energy
$F$ per site is then given, in the thermodynamic limit, by $F(\beta) = -G(\beta)/\beta$, where
$G(\beta) = \lim_{N \to \infty} \langle \ln Z(\beta) \rangle_\Xi / N$. The free energy at zero
temperature, $F_0$, gives the ground state of the system. $G$ can be calculated using a
saddle-point method that, once a replica symmetric (RS) ansatz $q^{ab} = q$, $r^{ab} = r$ (for
all $a < b$) and $Q^a = Q$, $R^a = R$ (for all $a$) has been made, leads to



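$$G = \min_{\mathcal{M}} \Big\{ \frac{1}{2}(rq + RQ + Ku) - \frac{\alpha}{2}\,\frac{\beta q}{1 + \beta(Q - q)} - \frac{\alpha}{2}\ln[1 + \beta(Q - q)] + \int D\zeta\, \ln I_{u,r,R}(\zeta) \Big\} , \tag{5}$$

where $\mathcal{M} = \{q, r, Q, R, u\}$,

$$I_{u,r,R}(\zeta) = \int_0^1 d\epsilon\, \exp\Big[-\frac{R + r}{2}\,\epsilon^2 - \Big(\frac{u}{2} + \sqrt{r}\,\zeta\Big)\epsilon\Big] , \tag{6}$$

and $D\zeta$ is the Gaussian measure $d\zeta\, g(\zeta)$ with
$g(\zeta) = \exp(-\zeta^2/2)/\sqrt{2\pi}$. Depending on the storage level $\alpha$ one obtains
the following results.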

(a) For $\alpha < \alpha_0(K)$ we have $F_0 = 0$. The parameter $q$, which is the typical
overlap between the activity configurations in two replicas such that the free energy vanishes,
increases from $q = K^2$ at $\alpha = 0$ to $q = Q$ at $\alpha = \alpha_0$. At
$\alpha = \alpha_0$, the space of neural activities such that $F_0 = 0$ shrinks to zero, and
$F_0$ becomes positive.

(b) For $\alpha > \alpha_0(K)$, we have $q = Q$ and thus, in the limit $\beta \to \infty$, we
introduce the new scaled variables $r = \rho\beta^2$, $R + r = \sigma\beta$, $u = 2\omega\beta$,
and $x = \beta(Q - q)$. The order parameters



are given by the saddle-point equations (7)-(9), together with

$$\rho = \frac{\alpha Q}{(1 + x)^2} , \qquad \sigma = \frac{\alpha}{1 + x} , \tag{10}$$



with $\zeta_0 = \sigma/\sqrt{\rho} + \zeta_1$ and $\zeta_1 = \omega/\sqrt{\rho}$. The critical
line $\alpha_0(K)$ (Fig. 1) is obtained in the limit $x \to \infty$, and the minimum $F_0$ is
given by $F_0 = \alpha Q (1 + x)^{-2}$, where $Q$ and $x$ are fixed by their saddle-point
values.

The condition of local stability of the RS solution with respect to small fluctuations in
replica space has been calculated. It turns out to be verified at $T = 0$ for all $\alpha$,
which is not surprising since, as previously noticed, the space of neural activities that
minimize the cost function is connected.

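The calculation of the probability distribution of the activities in the ground state can
be done with similar techniques and yields

$$\mathcal{P}(\epsilon) = \int \frac{D\zeta}{I_{u,r,R}(\zeta)}\, \exp\Big[-\frac{R + r}{2}\,\epsilon^2 - \Big(\frac{u}{2} + \sqrt{r}\,\zeta\Big)\epsilon\Big] , \tag{11}$$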
where $u$, $r$, and $R$ take their saddle-point values. Above the critical storage level,
$\alpha > \alpha_0$, the probability distribution reads, for $\epsilon \in [0, 1]$,

$$\mathcal{P}(\epsilon) = H(\zeta_0)\,\delta(\epsilon - 1) + \Delta\zeta\; g(\zeta_1 + \epsilon\,\Delta\zeta) + H(-\zeta_1)\,\delta(\epsilon) , \tag{12}$$

where $H(\zeta) = \int_\zeta^\infty Dz$, $\Delta\zeta = \zeta_0 - \zeta_1$, and $\zeta_0$ and
$\zeta_1$ are given as a function of $\alpha$ and $K$ by the saddle-point equations. Notice the
two $\delta$ functions at 0 and 1.
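
The structure of Eq. (12) can be made concrete with a small numerical sketch that evaluates the two $\delta$-peak weights and the continuous part, and checks that they sum to one. The values of `zeta0` and `zeta1` used below are placeholders chosen for illustration; they are not solutions of the saddle-point equations, and the function names are hypothetical.

```python
import math

def gauss(z):
    """Standard normal density g(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def H(z):
    """Gaussian tail H(z) = int_z^inf Dz, written with the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def activity_distribution(zeta0, zeta1):
    """Weights of the delta peaks at eps = 0 and eps = 1, and the continuous density on (0, 1)."""
    dz = zeta0 - zeta1
    weight_one = H(zeta0)       # coefficient of delta(eps - 1)
    weight_zero = H(-zeta1)     # coefficient of delta(eps)
    def density(eps):
        return dz * gauss(zeta1 + eps * dz)
    return weight_zero, weight_one, density

def moments(zeta0, zeta1, steps=2000):
    """Total probability and mean activity, using the midpoint rule on (0, 1)."""
    w0, w1, density = activity_distribution(zeta0, zeta1)
    de = 1.0 / steps
    mids = [(k + 0.5) * de for k in range(steps)]
    total = w0 + w1 + sum(density(e) for e in mids) * de
    mean = w1 + sum(e * density(e) for e in mids) * de
    return total, mean

# Placeholder values, NOT saddle-point solutions.
total, mean = moments(zeta0=0.8, zeta1=-0.3)
print("total probability:", total)
print("mean activity:", mean)
```

The mean of this distribution plays the role of the global activity, so in an actual calculation it would have to equal the value of $\gamma$ selected by the saddle point.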

For brevity, we do not report here the results concerning the case of discrete (three-state,
$\{\pm 1, 0\}$) neurons; we just mention that replica symmetry breaking is required.



(ii) The next step regards the gradient flow associated with a smooth version of the
energy function (4), implementing a soft quadratic constraint for the global activity. The
study of this gradient flow will allow us, on the one hand, to find the relation between
activities and afferent currents in the ground state, and on the other, to check the outcome of
the RS solution. We emphasize that this gradient flow does not correspond to the dynamics
of the network; this point will be considered later in (iii) and (iv).

The new cost function $E_\lambda$ can be written

$$E_\lambda(\Xi, \vec\epsilon) = \frac{1}{N}\sum_{\mu \neq 1}\Big(\sum_i \xi_i^\mu \xi_i^1 \epsilon_i\Big)^2 + \frac{\lambda}{N}\Big(\sum_i \epsilon_i - NK\Big)^2 , \tag{13}$$

where $\lambda > 0$ is a Lagrange multiplier. In the first term one recognizes the previous cost
function (4), whereas the second term is introduced in order to favor configurations with
activity $K$. The ground state $F_\lambda$ of $E_\lambda(\Xi, \vec\epsilon)$ can be calculated
with the same methods as the ground state of (4). For $\alpha < \alpha_0(K)$ we have
$F_\lambda = 0$ and $\gamma = K$, while for $\alpha > \alpha_0(K)$ the ground state $F_\lambda$
becomes positive, and we have $\gamma < K$. By computing the gradient of
$E_\lambda(\Xi, \vec\epsilon)$, it is now easy to derive the flow in the $\mathcal{E}$ space: in
$\mathbb{R}^N$ we have

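$$\dot{\vec\epsilon} = -\frac{\tau'}{2}\nabla E_\lambda = \tau'(A\vec\epsilon + \vec b) , \qquad \tau' > 0 , \tag{14}$$

where $\{A_{ij} = -(\frac{1}{N}\sum_{\mu \neq 1}\eta_i^\mu \eta_j^\mu + \frac{\lambda}{N})\}$
(with $\eta_i^\mu = \xi_i^\mu \xi_i^1$) and $\{b_i = \lambda K\}$, $i, j = 1, \ldots, N$.
Discretizing time, constraining the $\epsilon_i$ to stay in the $[0, 1]$ interval, and choosing
$\tau' = 1/\alpha$, we arrive at the following local equation:

$$\epsilon_i(t + 1) = \phi\Big(\frac{1}{\alpha}\big\{\lambda[K - \gamma(t)] + \gamma(t) - h_i(t)\,\xi_i^1\big\}\Big) , \tag{15}$$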

where $\phi(x) = 0$ if $x < 0$, $\phi(x) = 1$ if $x > 1$, and $\phi(x) = x$ otherwise,
$\gamma(t)$ is the global activity, and $h_i(t)$ is the afferent current at site $i$. Since
$E_\lambda$ is a positive semidefinite quadratic form, every local minimum in $[0, 1]^N$ is
also an absolute minimum in the same domain, and hence the gradient flow converges to the
ground state of $E_\lambda$. In Fig. 2 we compare the ground-state energy $F_\lambda$ computed
analytically with that given by simulations of Eq. (15) for a network of $N = 1000$ neurons.
The analytic solution and the numerical simulations agree remarkably well.
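
The local update of Eq. (15) can be sketched in a few lines. The version below is a minimal illustration under several assumptions that the text does not fix: sites are updated sequentially (one at a time, with overlaps refreshed after each change), $\lambda = 1$, $\alpha = P/N$, a small network of $N = 200$ neurons, and a cost reported per site and only up to an overall constant factor. All function names are hypothetical.

```python
import random

def cost_per_site(m, gamma, K, lam):
    """sum_{mu != 1} m_mu^2 + lam * (gamma - K)^2, proportional to E_lambda / N."""
    return sum(x * x for x in m) + lam * (gamma - K) ** 2

def clip01(x):
    """phi(x): identity on [0, 1], clipped outside."""
    return 0.0 if x < 0.0 else 1.0 if x > 1.0 else x

def relax(eta, K, lam=1.0, sweeps=50):
    """Sequential sweeps of Eq. (15); eta[mu][i] = xi_i^mu * xi_i^1 for mu != 1."""
    n_other, N = len(eta), len(eta[0])
    alpha = (n_other + 1) / N              # alpha = P/N (assumption)
    eps = [K] * N                          # uniform start, so gamma = K initially
    m = [sum(row) * K / N for row in eta]  # overlaps with the non-retrieved patterns
    gamma = K
    history = [cost_per_site(m, gamma, K, lam)]
    for _ in range(sweeps):
        for i in range(N):
            # Stability Delta_i = h_i * xi_i^1, obtained from Eq. (3) with m_1 = gamma.
            delta = gamma + sum(eta[mu][i] * m[mu] for mu in range(n_other)) - alpha * eps[i]
            new = clip01((lam * (K - gamma) + gamma - delta) / alpha)
            change = new - eps[i]
            if change != 0.0:
                # Keep overlaps and global activity consistent with the new activity.
                for mu in range(n_other):
                    m[mu] += eta[mu][i] * change / N
                gamma += change / N
                eps[i] = new
        history.append(cost_per_site(m, gamma, K, lam))
    return eps, history

rng = random.Random(1)
N, P, K = 200, 20, 0.5
xi = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(P)]
eta = [[xi[mu][i] * xi[0][i] for i in range(N)] for mu in range(1, P)]
eps, history = relax(eta, K)
print("initial cost per site:", history[0])
print("final cost per site:  ", history[-1])
```

With sequential updates each step is a clipped one-dimensional minimization of a convex quadratic, so the cost cannot increase from one sweep to the next; a fully parallel update with the same step size need not share this property.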
(iii) When $F_\lambda = 0$ (which implies $\gamma = K$) the minimum of the cost function
defined on $[0, 1]^N$ is also an absolute minimum of the same function defined on
$\mathbb{R}^N$ and therefore its gradient vanishes. This leads to a very simple relation between
the activity and the stability $\Delta_i = h_i \xi_i^1$ at each site [upon inserting
$\gamma = K$ in (15)],

$$K - \Delta_i = \alpha\,\epsilon_i , \tag{16}$$

which also coincides with the expression of the afferent currents, Eq. (3), in which
$m_\mu = 0$ for $\mu > 1$. The above relation yields straightforwardly the probability
distribution of the $\Delta$'s, $\mathcal{P}(\Delta_i = \Delta) = \mathcal{P}[\epsilon_i = (K - \Delta)/\alpha]$,
which implies that, for $\alpha < \alpha_0(K)$, this distribution is bounded between
$\Delta = K - \alpha$ and $\Delta = K$. For $\alpha > \alpha_0(K)$, due to the nonlinear
constraint on the bounds of the activities, the flow (14), (15) reaches a fixed point which does
not coincide with the minimum of $E_\lambda$ in $\mathbb{R}^N$, and therefore the distribution
of the stabilities is no longer given by Eq. (16).
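
The agreement between Eq. (16) and the afferent currents of Eq. (3) can be written out in two lines. The derivation below is a sketch that uses only the definitions of $\Delta_i$, $m_\mu$, and $\gamma$ given in the text.

```latex
\begin{align}
\Delta_i = h_i \xi_i^1
  &= m_1 + \sum_{\mu \neq 1} \xi_i^\mu \xi_i^1 m_\mu - \alpha\,\epsilon_i
  && \text{(Eq. (3) multiplied by } \xi_i^1 \text{, using } (\xi_i^1)^2 = 1\text{)} \\
  &= K - \alpha\,\epsilon_i
  && \text{(} m_\mu = 0 \text{ for } \mu \neq 1 \text{ and } m_1 = \gamma = K \text{)}
\end{align}
```

Since $\epsilon_i \in [0, 1]$, the stabilities then lie in $[K - \alpha, K]$, which is the interval quoted for the distribution of the $\Delta$'s.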

(iv) Under the initial assumption on the neuronal transfer function [$x f(x) \geq 0$], the
condition for a ground-state configuration correlated with the presented pattern to be a
fixed point of the dynamics is to have positive $\Delta$'s at all sites. This indeed happens if
the storage level $\alpha$ satisfies $\alpha < \alpha_*(K)$, where $\alpha_*(K)$ is the critical
line identified by $\alpha_*(K) = \min(K, \alpha_0(K))$ and shown in Fig. 1.

It follows from Eq. (16) that, when $\alpha < \alpha_*(K)$, the ground-state activities are
fixed points of the network dynamics with the (nonmonotonic) transfer function $f$

$$f(h) = \begin{cases} \mathrm{sgn}(h) & \text{if } |h| \in [0, K - \alpha] \\ \mathrm{sgn}(h)\,(K - |h|)/\alpha & \text{if } |h| \in [K - \alpha, K] \\ 0 & \text{if } |h| > K \end{cases} \tag{17}$$

shown in Fig. 3. In the same figure we also report, for $\alpha < \alpha_*(K)$, the equilibrium
distribution of the local currents obtained by numerical simulations performed on a network with
such a dynamics. All the $\Delta$'s belong to the interval $[K - \alpha, K]$, as expected. It is
worth noticing that, at equilibrium, only the region
$\mathcal{R} = [-K, -(K - \alpha)] \cup [K - \alpha, K]$ plays a role; as far as the equilibrium
properties are concerned, outside this region the form of the transfer function is arbitrary.
In $\mathcal{R}$, the dynamical behavior induced by $f$ corresponds to a regulation of the
output activity at each site. The latter turns out to be proportional to the difference
between the afferent current and a reference feedback signal equal to the global average
activity.
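
A direct transcription of the transfer function of Eq. (17) is sketched below, together with a check that activities obeying Eq. (16) are mapped onto themselves. The parameter values, the convention $\mathrm{sgn}(0) = 0$, and the function name `transfer` are assumptions made for this illustration.

```python
def transfer(h, K, alpha):
    """Nonmonotonic response function of Eq. (17); assumes 0 < alpha <= K."""
    a = abs(h)
    if a == 0.0 or a > K:
        return 0.0                      # sgn(0) = 0 by convention; zero output beyond K
    sign = 1.0 if h > 0 else -1.0
    if a <= K - alpha:
        return sign                     # saturated branch
    return sign * (K - a) / alpha       # linearly decreasing branch on [K - alpha, K]

K, alpha = 0.5, 0.2
samples = [(h, transfer(h, K, alpha)) for h in (-0.6, -0.4, -0.1, 0.0, 0.1, 0.4, 0.6)]
for h, v in samples:
    print(f"h = {h:+.1f}  ->  f(h) = {v:+.3f}")

# Fixed-point check: a site with activity eps has stability K - alpha * eps by Eq. (16),
# and the transfer function should return eps again.
errors = [abs(transfer(K - alpha * e, K, alpha) - e) for e in (0.0, 0.25, 0.5, 1.0)]
print("largest fixed-point error:", max(errors))
```

The output rises to saturation for small currents and then decreases to zero between $K - \alpha$ and $K$, which is the nonmonotonic shape discussed in the text.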

If the fixed points are stable, it means that for $\alpha < \alpha_*(K)$ a network with the
discussed dynamics is capable of stabilizing a configuration with activity $K$ highly correlated
with the retrieved pattern: the optimal activity is $K_{\rm opt} = 0.41$, for which we have
$\alpha_*(K_{\rm opt}) = 0.41$. The stability of the fixed points is difficult to prove, since
the distribution of the local currents has peaks at points where the derivative of the transfer
function is discontinuous. We have checked their stability numerically for
$\alpha < \alpha_*(K)$: in order to correctly initialize the system, we have used Eq. (15) to
find one set of initial ground-state activities $\{\epsilon_i(0)\}$ and then took
$V_i(0) = \xi_i^1 \epsilon_i(0)$.

Usually, the critical capacity is defined as an upper bound for the presence of retrieval
states correlated with the presented pattern. In the case of a sigmoid transfer function, the
critical capacity is obtained with a finite $\sigma^2$ and yields $\alpha_c \simeq 0.14$. Here
$\alpha_*$ is derived as an upper bound for the presence of retrieval states with
$\sigma^2 = 0$. This means that $\alpha_*$ is a lower bound for the critical capacity, which is
expected to be higher in the region where the $\Delta$'s are strictly positive at all sites,
i.e., $K > \alpha_*$ or $K > 0.41$. Interestingly enough, our results on the maximal storage
capacity $\alpha_*(K_{\rm opt})$ are very similar to an estimate obtained elsewhere by
a completely different method on a particular nonmonotonic transfer function.



The question of the size of basins of attraction in such a network remains open. Obviously,
the condition of local stability does not ensure that, starting from an initial configuration
highly correlated with a stored memory [as, for instance, $\{V_i(t = 0) = \xi_i^\mu\}$,
$i = 1, \ldots, N$], the network will converge to a ground-state configuration belonging to the
same memory. Preliminary numerical simulations show that the basins of attraction can be
considerably enlarged if one uses a dynamical threshold $\theta(t)$ instead of $K$ in Eq. (17),
determined at time $t$ by the instantaneous global activity of the network
[$\theta(t) = \gamma(t)$], given by Eq. (1) at time $t$. Such dynamical nonmonotonic behavior
might be seen as an effect of a regulatory mechanism of the global activity in the network,
which in real cortical networks is supposed to be due to inhibitory interneurons.